Skip to main content

Course set solutions – thematic course on contextual data part 1 - family data

The script below shows how to work with family data. This was reviewed in our theme course, which was run twice in 2022. The theme course also demonstrates how to work with contextual municipal data. Check separate example for this.

Click here for more about our courses.

 // Connecting to the data bank
require no.ssb.fdb:23 as db

textblock
1) Aggregate from individual data to family data
------------------------------------------------
endblock

// First, create an individual dataset for individuals in families consisting of married couples with young children
create-dataset individual_data1
import db/BEFOLKNING_REGSTAT_FAMTYP 2019-01-01 as family_type
tabulate family_type
keep if family_type == '2.1.1'

// Add demographic information
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month
generate age = 2019 - int(birth_year_month/100)
import db/BEFOLKNING_BARN_I_REGSTAT_FAMNR 2019-01-01 as num_children

// Create dataset for generating total work income per family => unit = family
create-dataset family_data1
import db/BEFOLKNING_REGSTAT_FAMNR 2019-01-01 as family_number
import db/INNTEKT_WYRKINNT 2019-01-01 as work_income
collapse (sum) work_income, by(family_number)
rename work_income family_income

// Merge family income into individual dataset (unit = individuals)
merge family_income into individual_data1 on PERSONID_1

// Create family statistics. The family number consists of the individual id of the oldest individual in the family, so when you remove individuals with missing family income you are left with a dataset with family as the unit. All individual information will then apply to the oldest individual in the family
use individual_data1
drop if sysmiss(family_income)
rename age oldest_age

histogram oldest_age, discrete
histogram num_children, discrete percent
tabulate num_children

summarize family_income
barchart (mean) family_income, by(num_children)
histogram family_income, freq
histogram family_income, by(num_children) percent


textblock
2) Construct family data and link to individual dataset
----------------------------------------------------
endblock

// First, create an individual dataset for individuals in families consisting of married couples with young children
create-dataset individual_data2
import db/BEFOLKNING_REGSTAT_FAMTYP 2019-01-01 as family_type
import db/BEFOLKNING_REGSTAT_FAMNR 2019-01-01 as family_number
tabulate family_type
keep if family_type == '2.1.1'

// Add various demographic information
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month
generate age = 2019 - int(birth_year_month/100)
import db/BEFOLKNING_BARN_I_REGSTAT_FAMNR 2019-01-01 as num_children

// Create dataset for generating total work income per family => unit = family
create-dataset family_data2
import db/BEFOLKNING_REGSTAT_FAMNR 2019-01-01 as family_number
import db/INNTEKT_WYRKINNT 2019-01-01 as work_income
collapse (sum) work_income, by(family_number)
rename work_income family_income

// Merge family income into individual dataset (unit = individuals)
merge family_income into individual_data2 on family_number

// Create individual and family statistics
use individual_data2
histogram age, discrete
histogram num_children, discrete percent
tabulate num_children

summarize family_income
barchart (mean) family_income, by(num_children)
histogram family_income, freq
histogram family_income, by(num_children) percent


textblock
3) Linking parent data to individual data
-----------------------------------
endblock

// Create an individual dataset with links to father and mother
create-dataset individual_data3
import db/INNTEKT_WYRKINNT 2019-01-01 as income
import db/BEFOLKNING_KJOENN as gender
import db/NUDB_BU 2019-01-01 as education
import db/BEFOLKNING_FAR_FNR as father_id
import db/BEFOLKNING_MOR_FNR as mother_id

// Get information about parents and link to individual dataset
create-dataset parent_data3
import db/INNTEKT_WYRKINNT 2019-01-01 as father_income
import db/NUDB_BU 2019-01-01 as father_education
clone-variables father_income -> mother_income
clone-variables father_education -> mother_education

merge father_income father_education into individual_data3 on father_id
merge mother_income mother_education into individual_data3 on mother_id

// Run a simple linear regression to test the relationship between own and parents' income
use individual_data3
generate male = 0
replace male = 1 if gender == '1'

destring education
generate high_education = 0
replace high_education = 1 if education >= 700000 & education < 900000
replace high_education = education if sysmiss(education)

destring father_education
generate high_father_education = 0
replace high_father_education = 1 if father_education >= 700000 & father_education < 900000
replace high_father_education = father_education if sysmiss(father_education)

destring mother_education
generate high_mother_education = 0
replace high_mother_education = 1 if mother_education >= 700000 & mother_education < 900000
replace high_mother_education = mother_education if sysmiss(mother_education)

summarize income father_income mother_income
histogram father_income, percent
histogram mother_income, percent
correlate father_income mother_income
tabulate high_father_education high_mother_education

regress income male father_income mother_income high_education high_father_education high_mother_education


textblock
4) Linking data about parents and grandparents to individual dataset
--------------------------------------------------------------------
endblock

// Create an individual dataset with links to father and mother
create-dataset individual_data4
import db/INNTEKT_WYRKINNT 2019-01-01 as income
import db/BEFOLKNING_KJOENN as gender
import db/NUDB_BU 2019-01-01 as education
import db/BEFOLKNING_FAR_FNR as father_id
import db/BEFOLKNING_MOR_FNR as mother_id

// Get information about parents and link to individual dataset
create-dataset parents4
import db/INNTEKT_WYRKINNT 2019-01-01 as father_income
import db/NUDB_BU 2019-01-01 as father_education
clone-variables father_income -> mother_income
clone-variables father_education -> mother_education

import db/BEFOLKNING_FAR_FNR as paternal_grandfather_id
import db/BEFOLKNING_FAR_FNR as maternal_grandfather_id
import db/BEFOLKNING_MOR_FNR as paternal_grandmother_id
import db/BEFOLKNING_MOR_FNR as maternal_grandmother_id

merge father_income father_education paternal_grandfather_id paternal_grandmother_id into individual_data4 on father_id
merge mother_income mother_education maternal_grandfather_id maternal_grandmother_id into individual_data4 on mother_id

create-dataset grandparents4
import db/INNTEKT_WYRKINNT 2019-01-01 as paternal_grandfather_income
import db/NUDB_BU 2019-01-01 as paternal_grandfather_education
clone-variables paternal_grandfather_income -> paternal_grandmother_income
clone-variables paternal_grandfather_income -> maternal_grandfather_income
clone-variables paternal_grandfather_income -> maternal_grandmother_income
clone-variables paternal_grandfather_education -> paternal_grandmother_education
clone-variables paternal_grandfather_education -> maternal_grandfather_education
clone-variables paternal_grandfather_education -> maternal_grandmother_education

merge paternal_grandfather_income paternal_grandfather_education into individual_data4 on paternal_grandfather_id
merge paternal_grandmother_income paternal_grandmother_education into individual_data4 on paternal_grandmother_id
merge maternal_grandfather_income maternal_grandfather_education into individual_data4 on maternal_grandfather_id
merge maternal_grandmother_income maternal_grandmother_education into individual_data4 on maternal_grandmother_id

use individual_data4
summarize income father_income mother_income paternal_grandfather_income paternal_grandmother_income maternal_grandfather_income maternal_grandmother_income
barchart(mean) income father_income mother_income paternal_grandfather_income paternal_grandmother_income maternal_grandfather_income maternal_grandmother_income


textblock
5) Construct individual dataset consisting of spouses
----------------------------------------------------
endblock

// Create a dataset consisting of the oldest person in each family
create-dataset oldest5
import db/BEFOLKNING_REGSTAT_FAMNR 2021-01-01 as family_number
import db/BEFOLKNING_REGSTAT_PERSONKODE 2021-01-01 as oldest_person_code
import db/BEFOLKNING_PARSTATUS 2021-01-01 as oldest_partner_status
import db/REGSYS_ARB_YRKE_STYRK08 2020-11-16 as oldest_job
keep if oldest_person_code == '1'

// Create a dataset with the youngest person in each family, and who is not a child
create-dataset youngest5
import db/BEFOLKNING_REGSTAT_FAMNR 2021-01-01 as family_number
import db/BEFOLKNING_REGSTAT_PERSONKODE 2021-01-01 as youngest_person_code
import db/BEFOLKNING_PARSTATUS 2021-01-01 as youngest_partner_status
import db/REGSYS_ARB_YRKE_STYRK08 2020-11-16 as youngest_job
keep if youngest_person_code == '2'

// Use the dataset with the oldest people and merge corresponding variables into the dataset with the youngest people via the family number (family number = oldest person's id number)
use oldest5
merge oldest_person_code oldest_partner_status oldest_job into youngest5 on family_number 

// The dataset youngest now contains data about both spouses/partners for all couples in the population. Run control tables
use youngest5
tabulate youngest_person_code, missing
tabulate oldest_person_code, missing
tabulate youngest_partner_status, missing
tabulate oldest_partner_status, missing
tabulate oldest_job, missing
tabulate youngest_job, missing

// Check if both, one or neither of the spouses/partners are in a job
generate youngest_job = 1 if sysmiss(youngest_job) == 0
generate oldest_job = 1 if sysmiss(oldest_job) == 0
tabulate oldest_job youngest_job, missing